Multiword expressions in spoken language: An exploratory study on pronunciation variation

Authors

Diana Binnenpoorte

Catia Cucchiarini

Lou Boves

Helmer Strik

Abstract

The study presented in this paper was aimed at exploring the possibilities of modelling specific pronunciation characteristics of multiword expressions (MWEs) for both automatic speech recognition (ASR) and automatic phonetic transcription (APT). For this purpose, we first drew up an inventory of frequently found N-grams extracted from orthographic transcriptions of spontaneous speech contained in a large corpus of spoken Dutch. These N-grams were filtered and subsequently assigned to linguistic categories. For a small selection of these N-grams we examined the phonetic transcriptions contained in the corpus. We found that the pronunciation of these N-grams differed to a large extent from the canonical form. In order to determine whether this is a general characteristic of spontaneous speech or rather the effect of the specific status of these N-grams, we analysed the pronunciations of the individual words composing the N-grams in two context conditions: (1) in the N-gram context and (2) in any other context. We found that words in N-grams do indeed have peculiar pronunciation patterns. This seems to suggest that the N-grams investigated may be considered MWEs that should be treated as lexical entries in the pronunciation lexicons used in ASR and APT, with their own specific pronunciation variants.

© 2005 Elsevier Ltd. All rights reserved. Computer Speech and Language 19 (2005) 433–449. ISSN 0885-2308. doi:10.1016/j.csl.2004.11.003

Corresponding author: D. Binnenpoorte. Tel.: +31 24 36 12 908; fax: +31 24 36 12 907. E-mail address: [email protected]
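The procedure outlined in the abstract (build an inventory of frequent N-grams, then compare how often a word departs from its canonical pronunciation inside an N-gram versus elsewhere) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `ngram_counts` and `deviation_rate`, the frequency threshold, the toy utterances, and the phonetic strings are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngram_counts(utterances, n):
    """Count word N-grams over whitespace-tokenised orthographic transcriptions."""
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def deviation_rate(observed, canonical):
    """Fraction of observed pronunciations that differ from the canonical form."""
    if not observed:
        return 0.0
    return sum(1 for p in observed if p != canonical) / len(observed)


# Hypothetical toy transcriptions (not taken from the corpus used in the paper).
utterances = [
    "ik weet het niet",
    "dat weet ik niet",
    "ik weet het niet zeker",
    "in ieder geval niet",
]

# Step 1: inventory of frequent trigrams (threshold of 2 is an assumption).
counts = ngram_counts(utterances, 3)
frequent = [ngram for ngram, c in counts.items() if c >= 2]

# Step 2: compare one word's pronunciations in the two context conditions.
# The phonetic strings below are invented placeholders.
canonical_het = "hEt"
in_ngram_context = ["@t", "t", "@t", "hEt"]
other_context = ["hEt", "@t", "hEt", "hEt"]

rate_in_ngram = deviation_rate(in_ngram_context, canonical_het)
rate_elsewhere = deviation_rate(other_context, canonical_het)
```

A clearly higher deviation rate in the N-gram condition than in the other-context condition would be the kind of pattern the abstract reports. A real analysis would also need the filtering and linguistic categorisation steps, which are not sketched here.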

Similar resources

Analyzing and identifying multiword expressions in spoken language

The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs ...

Multiword expressions in spontaneous speech: do we really speak like that?

In this study, we examined the pronunciation characteristics of multiword expressions (MWEs). We first drew up an inventory of frequently occurring N-grams extracted from orthographic transcriptions of spontaneous speech contained in a large corpus of spoken Dutch. For about 10% of these N-grams, phonetic transcriptions were available, and these were examined. Our results show that the pronunciation ...

Pragmatic expressions in cross-linguistic perspective

This paper focuses on some pragmatic expressions that are characteristic of informal spoken English, their possible equivalents in some other languages, and their use by EFL learners from different backgrounds. These expressions, called general extenders (e.g. and stuff, or something), are shown to be different from discourse markers and to exhibit variation in form, funct...

Gender in everyday speech and language: a corpus-based study

This paper presents an exploratory study on the relations between gender and everyday parlance. A “data-mining” approach is used to explore gender-specific characteristics in a large number of spontaneous telephone and face-to-face conversations. Our study focuses on speech rate (speaking rate and articulation rate), disfluencies (filled pauses and repetitions), pronunciation variation (phoneme...

A Corpus-Driven Study of the Variation of Co-Occurrence Patterns in Written and Spoken Registers

This paper will focus on the study of the variation of co-occurrence patterns encountered in written and spoken registers, through the analysis of a large lexical database of corpus-extracted multiword expressions (MWEs) of European Portuguese. Those MWEs were automatically extracted from a balanced 50 million word written corpus and a 1 million word spoken corpus, furthermore statistically int...

Journal:

Computer Speech & Language

Volume 19

Pages 433–449

Publication date: 2005
